Vienna
Interdisciplinary Research in Conversation: A Case Study in Computational Morphology for Language Documentation
Rice, Enora, von der Wense, Katharina, Palmer, Alexis
Computational morphology has the potential to support language documentation through tasks like morphological segmentation and the generation of Interlinear Glossed Text (IGT). However, our research outputs have seen limited use in real-world language documentation settings. This position paper situates the disconnect between computational morphology and language documentation within a broader misalignment between research and practice in NLP and argues that the field risks becoming decontextualized and ineffectual without systematic integration of User-Centered Design (UCD). To demonstrate how principles from UCD can reshape the research agenda, we present a case study of GlossLM, a state-of-the-art multilingual IGT generation model. Through a small-scale user study with three documentary linguists, we find that despite strong metric-based performance, the system fails to meet core usability needs in real documentation contexts. These insights raise new research questions around model constraints, label standardization, segmentation, and personalization. We argue that centering users not only produces more effective tools, but surfaces richer, more relevant research directions.
- North America > United States > Colorado > Boulder County > Boulder (0.28)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Asia > Thailand > Bangkok > Bangkok (0.05)
- Questionnaire & Opinion Survey (1.00)
- Research Report > New Finding (0.88)
- Research Report > Experimental Study (0.66)
GLiNER2: An Efficient Multi-Task Information Extraction System with Schema-Driven Interface
Zaratiana, Urchade, Pasternak, Gil, Boyd, Oliver, Hurn-Maloney, George, Lewis, Ash
Information extraction (IE) is fundamental to numerous NLP applications, yet existing solutions often require specialized models for different tasks or rely on computationally expensive large language models. We present GLiNER2, a unified framework that enhances the original GLiNER architecture to support named entity recognition, text classification, and hierarchical structured data extraction within a single efficient model. Built on a pretrained transformer encoder architecture, GLiNER2 maintains CPU efficiency and compact size while introducing multi-task composition through an intuitive schema-based interface. Our experiments demonstrate competitive performance across extraction and classification tasks with substantial improvements in deployment accessibility compared to LLM-based alternatives. We release GLiNER2 as an open-source pip-installable library with pre-trained models and documentation at https://github.com/fastino-ai/GLiNER2.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.05)
- North America > United States > Virginia > Fairfax County > Vienna (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)
Suspected Undeclared Use of Artificial Intelligence in the Academic Literature: An Analysis of the Academ-AI Dataset
Since generative artificial intelligence (AI) tools such as OpenAI's ChatGPT became widely available, researchers have used them in the writing process. The consensus of the academic publishing community is that such usage must be declared in the published article. Academ-AI documents examples of suspected undeclared AI usage in the academic literature, discernible primarily due to the appearance in research papers of idiosyncratic verbiage characteristic of large language model (LLM)-based chatbots. This analysis of the first 500 examples collected reveals that the problem is widespread, penetrating the journals and conference proceedings of highly respected publishers. Undeclared AI seems to appear in journals with higher citation metrics and higher article processing charges (APCs), precisely those outlets that should theoretically have the resources and expertise to avoid such oversights. An extremely small minority of cases are corrected post-publication, and the corrections are often insufficient to rectify the problem. The 500 examples analyzed here likely represent a small fraction of the undeclared AI present in the academic literature, much of which may be undetectable. Publishers must enforce their policies against undeclared AI usage in cases that are detectable; this is the best defense currently available to the academic publishing community against the proliferation of undisclosed AI.
- Europe > Austria > Vienna (0.14)
- North America > United States > Virginia > Fairfax County > Vienna (0.04)
- North America > United States > Kentucky > Jefferson County > Louisville (0.04)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Therapeutic Area > Immunology (0.93)
- Energy (0.67)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.88)
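The Academ-AI abstract above says suspected cases are spotted mainly through idiosyncratic chatbot verbiage. As a rough illustration only, the sketch below flags a few such phrases by case-insensitive substring matching. The phrase list and the function name `flag_suspect_phrases` are hypothetical choices made for this example, not the dataset's actual criteria or tooling.

```python
import re

# Hypothetical, illustrative phrase list; NOT the Academ-AI dataset's criteria.
TELLTALE_PHRASES = [
    "as an ai language model",
    "regenerate response",
    "certainly, here is",
    "as of my last knowledge update",
]


def flag_suspect_phrases(text: str) -> list[str]:
    """Return the telltale phrases found in `text`.

    Matching is case-insensitive, and runs of whitespace (including line
    breaks from PDF extraction) are collapsed to single spaces first.
    """
    normalized = re.sub(r"\s+", " ", text.lower())
    return [phrase for phrase in TELLTALE_PHRASES if phrase in normalized]


if __name__ == "__main__":
    sample = ("The results are significant. As an AI language model, "
              "I cannot access the\nlatest data. Regenerate response")
    print(flag_suspect_phrases(sample))
    # → ['as an ai language model', 'regenerate response']
```

Exact phrase matching only catches verbatim leftovers; paraphrased or edited output passes unflagged, which is consistent with the abstract's point that much undeclared AI use may be undetectable.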
GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer
Zaratiana, Urchade, Tomeh, Nadi, Holat, Pierre, Charnois, Thierry
Named Entity Recognition (NER) is essential in various Natural Language Processing (NLP) applications. Traditional NER models are effective but limited to a set of predefined entity types. In contrast, Large Language Models (LLMs) can extract arbitrary entities through natural language instructions, offering greater flexibility. However, their size and cost, particularly for those accessed via APIs like ChatGPT, make them impractical in resource-limited scenarios. In this paper, we introduce a compact NER model trained to identify any type of entity. Leveraging a bidirectional transformer encoder, our model, GLiNER, facilitates parallel entity extraction, an advantage over the slow sequential token generation of LLMs. Through comprehensive testing, GLiNER demonstrates strong performance, outperforming both ChatGPT and fine-tuned LLMs in zero-shot evaluations on various NER benchmarks.
- North America > Canada > Quebec > Montreal (0.05)
- North America > United States > Virginia > Fairfax County > Vienna (0.04)
- North America > United States > New Mexico > Santa Fe County > Santa Fe (0.04)
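The GLiNER abstract above contrasts parallel entity extraction with sequential token generation. The toy sketch below shows why matching can be parallel, assuming (for illustration) that each candidate span and each entity-type label has already been embedded as a vector and that a span is assigned a type when the sigmoid of their dot product exceeds a threshold. The hand-picked 2-D vectors stand in for encoder outputs, and `score_spans` is a hypothetical helper, not the GLiNER library's API.

```python
import math
from itertools import product


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def score_spans(span_embs: dict[str, list[float]],
                type_embs: dict[str, list[float]],
                threshold: float = 0.5) -> list[tuple[str, str, float]]:
    """Score every (span, entity type) pair and keep those above `threshold`.

    No pair's score depends on any other pair, so in a real model all of
    them can be computed at once (e.g., as one matrix product) instead of
    emitting entities token by token.
    """
    hits = []
    for (span, s_vec), (label, t_vec) in product(span_embs.items(), type_embs.items()):
        p = sigmoid(dot(s_vec, t_vec))
        if p > threshold:
            hits.append((span, label, round(p, 3)))
    return hits


if __name__ == "__main__":
    # Hypothetical embeddings standing in for encoder outputs.
    spans = {"Paris": [2.0, -1.0], "Marie Curie": [-1.0, 2.0], "in": [0.0, 0.0]}
    types = {"location": [1.0, 0.0], "person": [0.0, 1.0]}
    print(score_spans(spans, types))
    # → [('Paris', 'location', 0.881), ('Marie Curie', 'person', 0.881)]
```

Because entity types are just embedded labels, a new type can be queried at inference time by embedding its name, which is what lets this style of model handle arbitrary entity types.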
AI tech identifies suicide risk in military veterans before it's too late: 'Flipping the model'
U.S. Marine Corps veteran Adam Cooper is joined by Army veteran Lowell Koppert as he nears the end of his 22-hour workout and shares his 'radical' pledge to bring more awareness to the issue of veteran suicides. If you or someone you know is having thoughts of suicide, please contact the Suicide & Crisis Lifeline at 988 or 1-800-273-TALK (8255). As the mental health of U.S. military veterans remains a major concern among many people in our society, new technology could become a lifesaver. An AI platform developed by ClearForce, a tech company in Vienna, Virginia, aims to identify the risk of suicide among veterans before it's too late. Col. Michael Hudson, vice president at ClearForce, spoke to Fox News Digital in an interview to discuss his efforts on the veteran suicide initiative.
- North America > United States > Virginia > Fairfax County > Vienna (0.25)
- North America > United States > California (0.05)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Government > Military (1.00)
Semantically-informed Hierarchical Event Modeling
Dipta, Shubhashis Roy, Rezaee, Mehdi, Ferraro, Francis
Prior work has shown that coupling sequential latent variable models with semantic ontological knowledge can improve the representational capabilities of event modeling approaches. In this work, we present a novel, doubly hierarchical, semi-supervised event modeling framework that provides structural hierarchy while also accounting for ontological hierarchy. Our approach consists of multiple layers of structured latent variables, where each successive layer compresses and abstracts the previous layers. We guide this compression through the injection of structured ontological knowledge that is defined at the type level of events: importantly, our model allows for partial injection of semantic knowledge and it does not depend on observing instances at any particular level of the semantic ontology. Across two different datasets and four different evaluation metrics, we demonstrate that our approach is able to outperform the previous state-of-the-art approaches by up to 8.5%, demonstrating the benefits of structured and semantic hierarchical knowledge for event modeling.
- North America > United States > Maryland > Baltimore County (0.14)
- North America > United States > Maryland > Baltimore (0.14)
- Oceania > Australia (0.04)
- Government > Regional Government > North America Government > United States Government (0.46)
- Government > Military (0.46)
FoundationDB: A Distributed Key-Value Store
FoundationDB is an open-source transactional key-value store created more than 10 years ago. It is one of the first systems to combine the flexibility and scalability of NoSQL architectures with the power of ACID transactions. FoundationDB adopts an unbundled architecture that decouples an in-memory transaction management system, a distributed storage system, and a built-in distributed configuration system. Each sub-system can be independently provisioned and configured to achieve scalability, high availability, and fault tolerance. FoundationDB includes a deterministic simulation framework, used to test every new feature under a myriad of possible faults. This rigorous testing makes FoundationDB extremely stable and allows developers to introduce and release new features at a rapid cadence. FoundationDB offers a minimal and carefully chosen feature set, which has enabled a range of disparate systems to be built as layers on top. FoundationDB is the underpinning of cloud infrastructure at Apple, Snowflake, and other companies, due to its consistency, robustness, and availability for storing user data, system metadata and configuration, and other critical information.
Many cloud services rely on scalable, distributed storage backends for persisting application state. Such storage systems must be fault tolerant and highly available, and at the same time provide sufficiently strong semantics and flexible data models to enable rapid application development. Such services must scale to billions of users, petabytes or exabytes of stored data, and millions of requests per second. More than a decade ago, NoSQL storage systems emerged offering ease of application development, making it simple to scale and operate storage systems, offering fault tolerance and supporting a wide range of data models (instead of the traditional rigid relational model).
In order to scale, these systems sacrificed transactional semantics, and instead provided eventual consistency, forcing application developers to reason about interleavings of updates from concurrent operations. FoundationDB (FDB) was created in 2009 and gets its name from the focus on providing what we saw as the foundational set of building blocks required to build higher-level distributed systems.
- North America > United States > California > Santa Clara County > Cupertino (0.06)
- North America > United States > California > San Mateo County > San Mateo (0.05)
- North America > United States > Virginia > Fairfax County > Vienna (0.04)
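The FoundationDB abstract above credits its stability to a deterministic simulation framework that tests features under many possible faults. The core idea can be sketched in a few lines: route every source of nondeterminism (here, which write happens and whether its replication message is dropped) through one seeded random generator, so any run, including a failing one, can be replayed exactly from its seed. The toy primary/replica workload and the `simulate` function are hypothetical; FoundationDB's actual framework is far more complete than this.

```python
import random


def simulate(seed: int, steps: int = 20, drop_rate: float = 0.2) -> list[str]:
    """Run a toy replicated-write workload with seeded fault injection.

    All randomness comes from one `random.Random(seed)` instance, so the
    same seed always produces the same trace.
    """
    rng = random.Random(seed)
    primary: dict[str, int] = {}
    replica: dict[str, int] = {}
    trace: list[str] = []
    for step in range(steps):
        key = f"k{rng.randrange(3)}"
        value = rng.randrange(100)
        primary[key] = value
        if rng.random() < drop_rate:
            # Injected fault: the replication message is lost.
            trace.append(f"{step}: replicate {key}={value} DROPPED")
        else:
            replica[key] = value
            trace.append(f"{step}: replicate {key}={value} ok")
    # Invariant check: list keys where the replica no longer matches the primary.
    diverged = sorted(k for k in primary if replica.get(k) != primary[k])
    trace.append(f"diverged: {diverged}")
    return trace


if __name__ == "__main__":
    for line in simulate(seed=42, steps=5):
        print(line)
```

When an invariant check fails for some seed, rerunning `simulate` with that seed reproduces the identical sequence of events, which is what makes rare fault interleavings debuggable.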
A survey on knowledge-enhanced multimodal learning
Lymperaiou, Maria, Stamou, Giorgos
Multimodal learning has been a field of increasing interest, aiming to combine various modalities in a single joint representation. Especially in the area of visiolinguistic (VL) learning, multiple models and techniques have been developed, targeting a variety of tasks that involve images and text. VL models have reached unprecedented performance by extending the idea of Transformers so that both modalities can learn from each other. Massive pre-training procedures enable VL models to acquire a certain level of real-world understanding, although many gaps can be identified: their limited comprehension of commonsense, factual, temporal, and other everyday knowledge calls into question the extendability of VL tasks. Knowledge graphs and other knowledge sources can fill those gaps by explicitly providing missing information, unlocking novel capabilities of VL models. At the same time, knowledge graphs enhance the explainability, fairness, and validity of decision making, issues of utmost importance for such complex implementations. The current survey aims to unify the fields of VL representation learning and knowledge graphs, and provides a taxonomy and analysis of knowledge-enhanced VL models.
- North America > United States > New York > New York County > New York City (0.14)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Africa (0.04)
- Education (0.92)
- Health & Medicine (0.67)
Converting Laws to Programs
You would think something as numerical as income tax law would be similar to mathematical logic, but it is not, Protzenko says, because it is not written with the precision and clarity that would "make it amenable to a very mathematical reading of it." For example, the law does not mention that a number may need to be rounded to whole cents. "The law won't tell you what you're supposed to do with rounding numbers and that can lead to ambiguity and a lack of specification of what's supposed to happen," he says. Healthcare law is also very complex. Faisal Khan, senior legal counsel at healthcare law firm Nixon Gwilt Law in Vienna, VA, says, "Software for HIPAA compliance must incorporate algorithms that target and hit on all the top-level statutory requirements and implementing regulations." To make that happen, Khan says, "There must be a team of compliance-related input as many of the regulations essentially function as guidelines for companies to adhere to." That means a process or ...
- North America > United States > Virginia > Fairfax County > Vienna (0.25)
- North America > United States > Ohio (0.06)
- North America > United States > Washington > King County > Seattle (0.05)
- Law > Taxation Law (1.00)
- Health & Medicine (1.00)
- Government > Tax (1.00)
- Government > Regional Government > North America Government > United States Government (0.96)
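The rounding gap Protzenko describes above is easy to make concrete: when a statute does not say how to round to whole cents, the programmer has to pick a rule, and different rules can disagree on the same input. The sketch below uses Python's `decimal` module with a hypothetical income of 1012.50 and a hypothetical 13% rate (neither taken from any real tax code); `tax_due` is an illustrative helper, not part of any tax-law encoding.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN, ROUND_HALF_UP

CENT = Decimal("0.01")


def tax_due(income: Decimal, rate: Decimal, rounding: str) -> Decimal:
    """Compute income * rate exactly, then round to whole cents using `rounding`."""
    return (income * rate).quantize(CENT, rounding=rounding)


if __name__ == "__main__":
    income = Decimal("1012.50")  # hypothetical amount
    rate = Decimal("0.13")       # hypothetical rate; exact product is 131.625
    for rule in (ROUND_HALF_UP, ROUND_HALF_EVEN, ROUND_DOWN):
        print(rule, tax_due(income, rate, rule))
    # → ROUND_HALF_UP 131.63
    # → ROUND_HALF_EVEN 131.62
    # → ROUND_DOWN 131.62
```

A one-cent disagreement looks trivial, but an implementation that silently picks one rule is making a legal interpretation the text never specified, which is exactly the kind of ambiguity a faithful translation of law into code has to surface.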
Top 10 Cognitive Computing Startups Situated in India in 2021
Cognitive computing is the use of computerized models to simulate the human thought process in complex situations where the answers may be ambiguous and uncertain. The term is closely associated with IBM's cognitive computer system, Watson. Cognitive computing overlaps with AI and involves many of the same underlying technologies to power cognitive applications, including expert systems, neural networks, robotics, and virtual reality (VR). Marlabs Inc is a digital firm that has offices in Piscataway, N.J., and Bangalore, India. Founded in 1996, the company's 1,500 employees have over two decades of experience in CRM consulting, SI, and big data consulting.
- Asia > India > Karnataka > Bengaluru (0.30)
- North America > United States > New Jersey > Middlesex County > Piscataway (0.25)
- Europe > United Kingdom (0.16)
- Information Technology (0.51)
- Health & Medicine (0.31)
- Banking & Finance (0.31)